(54) METHOD FOR SCREENING FULL-LENGTH cDNA CLONES

(57) A method for efficiently screening full-length cDNA clones, which comprises: determining the
base sequence in the 5'-region of each clone contained in a cDNA library prepared by a method for
constructing a cDNA library containing full-length cDNAs at a high ratio; examining the presence or
absence of an initiation ATG in this 5'-region, and its location, using originally developed
software for predicting initiation codons in cDNA, thereby accurately judging the presence or
absence of the initiation codon and its location; and selecting from the cDNA library the cDNAs
judged to carry the initiation codon. Moreover, a cDNA library containing full-length cDNAs at an
extremely high ratio can be constructed by combining the clones thus selected.
Description

Technical field 

[0001] The present invention belongs to the field of genetic engineering, and relates to a method
for screening full-length cDNA clones.

Background Art

[0002] Recently, genome projects targeting various animals, plants, and microorganisms have been in
progress. Numerous genes have been isolated and their functions are under investigation. To analyze
the functions of isolated genes efficiently, it is important to obtain cDNA clones capable of
expressing complete proteins, that is, full-length cDNA clones.

[0003] The following are known methods for constructing a full length-enriched cDNA library: the
oligo capping method, in which an RNA linker is enzymatically bound to the Cap of mRNA (Sugano &
Maruyama, Proteins, Nucleic Acids and Enzymes, 38: 476-481, 1993; Suzuki & Sugano, Proteins, Nucleic
Acids and Enzymes, 41: 603-607, 1996; M. Maruyama and S. Sugano, Gene, 138, 171-174, 1994); the
modified oligo capping method, developed by combining the oligo capping method with the Okayama-Berg
method (S. Kato et al., Gene, 150, 243-250, 1994; Kato & Sekine, Unexamined Published Japanese
Patent Application (JP-A) No. Hei 6-153953, published June 3, 1994); the linker chemical-binding
method, in which a DNA linker is bound to the Cap (N. Merenkova and D. M. Edwards, WO 96/34981,
Nov. 7, 1996); and the Cap chemical modification method, in which the Cap is modified with biotin
(P. Carninci et al., Genomics, 37, 327-336, 1996; P. Carninci et al., DNA Research, 4, 61-66, 1997).
All of these methods modify the Cap of eukaryotic mRNA to prepare a full length-enriched cDNA
library. A known method for constructing a full length-enriched cDNA library by trapping the Cap
uses Cap-binding proteins derived from yeast or HeLa cells to label the 5'-cap site (I. Edery et
al., MCB, 15, 3363-3371, 1995). Also known is Cap Finder (Clontech), which uses the Cap Switch
oligonucleotide method, in which the Cap Switch oligonucleotide is annealed by C-tailing the 5' end
of a first strand cDNA.
[0004] A cDNA library constructed by these methods is richer in full-length cDNAs than one obtained
by conventional methods. However, it still contains incomplete-length clones to some extent. To
analyze gene functions efficiently and to clone novel useful genes efficiently, a method for easily
confirming whether each clone contained in a cDNA library is full-length has been desired.

Disclosure of the Invention 

[0005] An objective of the present invention is to provide a method for efficiently screening
full-length cDNA clones, and a method for constructing a full length-enriched cDNA library.

[0006] The present inventors studied how to achieve the above objective and contemplated efficiently
screening full-length cDNAs from a cDNA library using the presence or absence of a translation
initiation codon as an index, based on the fact that a cDNA lacking part of its 5'-region is likely
to lack a translation initiation codon, whereas a full-length cDNA contains one. In particular, the
inventors assumed that full-length cDNAs could be efficiently screened from a cDNA library
constructed by a method for preparing a full length-enriched cDNA library. Specifically, the
inventors thought that full-length cDNA clones could be efficiently isolated by constructing a cDNA
library by a method for preparing a full length-enriched cDNA library, determining several hundred
base pairs of nucleotide sequence from the 5' end, and analyzing the presence or absence of an
initiation codon in this region to select the clones containing initiation codons.

[0007] However, few programs for predicting the initiation site of cDNA have been developed (e.g.,
A. G. Pedersen, Proceedings of the Fifth International Conference on Intelligent Systems for
Molecular Biology, p226-233, 1997, held in Halkidiki, Greece, June 21-26, 1997). Although some
programs for exon prediction have been developed ("Gene Finder", V. V. Solovyev et al., Nucleic
Acids Res., 22, 5156-5163, 1994; "Grail", Y. Xu et al., Genet-Eng-N-Y, 16, 241-253, 1994), an
initiation site cannot be accurately determined by relying solely on these programs.

[0008] The present inventors therefore developed their own program for predicting cDNA initiation
codons, determined the nucleotide sequences of the 5'-region of clones contained in a cDNA library
constructed by a method for preparing a full length-enriched cDNA library, and used this software
program to examine whether an initiation codon exists in this 5'-region.

[0009] More specifically, a full length-enriched cDNA library was constructed by the oligo capping
method, and the nucleotide sequences of the 5'-regions of some clones contained in the cDNA library
were determined. Based on the determined sequences, the clones were divided into known and novel
ones through a database search. The presence or absence of an initiation codon and its location in
the determined nucleotide sequences of the 5'-regions were judged using the initiation codon
prediction program. For the known clones, it was examined whether the location of the initiation
codon recognized by the initiation codon prediction program coincided with that of the initiation
codon in the databases. Indeed, the presence or absence and the location of the initiation codon
predicted by the program for the known clones coincided with the information in the databases.

EP 1 026 242 A1

[0010] Thus, the software program developed by the present inventors can accurately recognize the
presence or absence of an initiation codon and its location, and full-length cDNA clones can be
efficiently screened by selecting from the cDNA library the clones that the program recognizes as
containing an initiation codon. Moreover, a cDNA library extremely rich in full-length cDNAs can be
constructed by combining the screened clones.
[0011] The present invention relates to a method for screening full-length cDNA clones from a cDNA
library and a method for constructing a full-length cDNA library by combining cDNA clones screened
by the screening method. More specifically, it relates to:

(1) A method for isolating a full-length cDNA clone, the method comprising:

(a) determining a nucleotide sequence from the 5'-region of a cDNA clone contained in a cDNA library,
(b) determining the presence or absence of an initiation codon in the nucleotide sequence determined
in (a) using an initiation codon prediction program, and
(c) selecting clones recognized as containing the initiation codon in (b);

(2) The method of (1), wherein the cDNA library is constructed by a method for preparing a full
length-enriched cDNA library;

(3) The method of (1), wherein the cDNA library is constructed by a method comprising a step of
modifying Cap of mRNA;

(4) A method for constructing a full-length cDNA library, the method comprising:

(a) determining a nucleotide sequence from the 5'-region of a cDNA clone contained in a cDNA library,
(b) determining the presence or absence of an initiation codon in the nucleotide sequence determined
in (a) using an initiation codon prediction program,
(c) selecting clones recognized as containing the initiation codon in (b), and
(d) combining the clones selected in (c);

(5) The method of (4), wherein the cDNA library is prepared by a method for constructing a full
length-enriched cDNA library;

(6) The method of (4), wherein the cDNA library is constructed by a method comprising a step of
modifying Cap of mRNA; and

(7) A cDNA library obtainable by the method of (4).

[0012] The present invention is based on the inventors' findings that full-length cDNA clones can be
efficiently isolated by analyzing the nucleotide sequences of the 5'-region of cDNAs in a cDNA
library, in particular a full length-enriched cDNA library, with a software program that accurately
predicts a translation initiation codon, and that a full length-enriched cDNA library can be
constructed by combining the isolated cDNA clones. The method for screening full-length cDNA clones
of the present invention comprises (a) determining a nucleotide sequence from the 5'-region of a
cDNA clone contained in a cDNA library, (b) determining the presence or absence of an initiation
codon in the determined nucleotide sequence using an initiation codon prediction program, and (c)
selecting clones recognized as containing the initiation codon. The method for constructing a
full-length cDNA library of the present invention comprises, in addition to the above steps (a) to
(c), step (d) of combining the screened clones.

[0013] In the method of the present invention, the "cDNA clone" whose 5'-region nucleotide sequence
is to be determined is not particularly limited. However, full-length cDNAs cannot be isolated as
efficiently from clones derived from a library that is not rich in full-length cDNAs as from clones
derived from a full length-enriched cDNA library. Therefore, the cDNA clone is preferably derived
from a library constructed by one of the above-described methods for preparing a full
length-enriched cDNA library, including, for example, the oligo capping method, in which an RNA
linker is enzymatically bound to the Cap of mRNA (Sugano & Maruyama, Proteins, Nucleic Acids and
Enzymes, 38: 476-481, 1993; Suzuki & Sugano, Proteins, Nucleic Acids and Enzymes, 41: 603-607, 1996;
M. Maruyama and S. Sugano, Gene, 138, 171-174, 1994), the modified oligo capping method, developed
by combining the oligo capping method with the Okayama-Berg method (S. Kato et al., Gene, 150,
243-250, 1994; Kato & Sekine, JP-A-Hei 6-153953, June 3, 1994), the linker chemical-binding method,
in which a DNA linker is chemically bound to the Cap (N. Merenkova and D. M. Edwards, WO 96/34981,
Nov. 7, 1996), the Cap chemical modification method, in which the Cap is modified with biotin (P.
Carninci et al., Genomics, 37, 327-336, 1996; P. Carninci et al., DNA Research, 4, 61-66, 1997), and
the method using Cap-binding proteins derived from yeast or HeLa cells (I. Edery et al., MCB, 15,
3363-3371, 1995), or from a library prepared by Cap Finder using the Cap Switch oligonucleotide
method.

[0014] A cDNA clone can be isolated from a cDNA library by standard methods described in, for
example, J. Sambrook, E. F. Fritsch & T. Maniatis, Molecular Cloning, Second Edition, Cold Spring
Harbor Laboratory Press, 1989.
[0015] A nucleotide sequence can be determined from the 5'-region of a clone by, for example,
standard methods using DNA sequencing reagents and a DNA sequencer available from Applied
Biosystems, etc. The whole nucleotide sequence of the clone does not have to be determined;
determining about 1,000 nucleotides from the 5' end is sufficient. High accuracy can be expected
even when only about 500, or even about 300, nucleotides are determined.
[0016] The "initiation codon prediction program" used for analyzing a nucleotide sequence from the
5'-region of a clone is preferably the program developed by the present inventors as described in
Example 1 below. The presence or absence of an initiation codon in a determined sequence is judged
by a score derived from the results of analysis with the program. A cDNA clone with a high score,
recognized as containing an initiation codon in the determined sequence, usually comprises a
full-length cDNA, while one with a low score, recognized as not containing an initiation codon in
the determined sequence, contains an incomplete-length cDNA. Thus, a full-length cDNA can be
efficiently isolated by selecting from a cDNA library a cDNA judged as containing an initiation
codon in its nucleotide sequence. Indeed, in one embodiment of the analysis with the program
described in Example 1 below, in which a cDNA library with a full-length cDNA content of 51% was
used to screen clones (the highest score was 0.94), the content of full-length clones among the
screened clones was 71% when clones showing a score of 0.50 or higher were selected, 77% with a
score of 0.70 or higher, 81% with a score of 0.80 or higher, and 85% with a score of 0.90 or higher.
Therefore, full-length cDNA clones can be screened with high accuracy by selecting clones with high
scores using the program described in Example 1.
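
The selection step described in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the inventors' program: the names `best_score` and `select_clones`, the dictionary layout, and the rule of judging a clone by its highest-scoring ATG are hypothetical. The scores are the legible ATGpr values for four clones listed in Tables 1 and 2.

```python
# Illustrative sketch only: the function names and the "highest ATG score per
# clone" rule are assumptions, not the inventors' ATGpr implementation.

def best_score(atg_scores):
    """Return the highest score among a clone's ATGs (0.0 if it has none)."""
    return max(atg_scores, default=0.0)


def select_clones(clone_scores, threshold):
    """Return, in sorted order, the clones whose best ATG score reaches the threshold."""
    return sorted(
        name
        for name, scores in clone_scores.items()
        if best_score(scores) >= threshold
    )


# ATGpr scores copied from Table 1 (first two clones) and Table 2 (last two).
CLONE_SCORES = {
    "F-NT2RP1000025": [0.94, 0.13, 0.05, 0.09, 0.05],
    "F-NT2RP1000039": [0.90, 0.05, 0.11, 0.05, 0.05],
    "F-NT2RP1000013": [0.05, 0.05, 0.32, 0.11, 0.10, 0.08,
                       0.05, 0.06, 0.06, 0.08, 0.07, 0.20],
    "F-NT2RP1000122": [0.07, 0.05, 0.05, 0.06, 0.05, 0.05,
                       0.05, 0.05, 0.06, 0.12, 0.05, 0.05],
}

if __name__ == "__main__":
    for threshold in (0.50, 0.70, 0.80, 0.90):
        print(threshold, select_clones(CLONE_SCORES, threshold))
```

Raising the threshold trades yield for purity, which is the trend the percentages in this paragraph describe; the four clones here are far too few to reproduce those percentages.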

[0017] Moreover, a cDNA library reconstructed by combining clones selected by the method for
screening full-length cDNA clones of the present invention is extremely rich in full-length cDNAs
compared with the parent cDNA library used for screening. By expressing all of the cDNAs capable of
expressing proteins in the thus-obtained library, a system containing a mixture of expressed
proteins can be obtained for efficiently analyzing gene functions. This system enables efficient
cloning of useful genes.

Best Mode for Carrying out the Invention 

[0018] The present invention is illustrated in detail below with reference to the following
examples, but is not to be construed as being limited thereto.

Example 1: Preparation of a program for predicting a translation initiation codon of cDNA

[0019] The translation initiation codon prediction program of the present invention recognizes a
putative authentic initiation codon among all the ATGs contained in a given cDNA sequence fragment.
The program bases its prediction on A) information on the similarity of given regions (several tens
to several hundreds of base pairs) on both sides of a putative ATG to translational regions and B)
information on the similarity of regions near a putative ATG to those near an authentic initiation
codon. The characteristics of sequences in translational regions and in regions near an initiation
codon are extracted beforehand from numerous sequences whose translational and non-translational
regions have been identified. The program predicts an initiation codon based on these
characteristics.
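
As a minimal sketch of the first step, listing every candidate ATG in a sequence fragment, the following Python function returns 1-based positions in the same convention as the "Location of ATG" columns of the tables. The function name is hypothetical; the test sequence is the first 180 nucleotides of SEQ ID NO: 4 as printed in the sequence listing.

```python
# Illustrative sketch: enumerate all candidate ATGs in a cDNA fragment.
# The function name is an assumption; positions are 1-based.

def find_atg_positions(sequence):
    """Return the 1-based position of every ATG in the sequence."""
    seq = sequence.upper()
    positions = []
    start = seq.find("ATG")
    while start != -1:
        positions.append(start + 1)
        start = seq.find("ATG", start + 1)
    return positions


# First 180 nucleotides of SEQ ID NO: 4 (clone F-NT2RP1000020).
SEQ4_FIRST_180 = (
    "ATGCGCCCGCGCGGCCCTATAGGCGCCTCCTCCGCCCGCCGCCCGGGAGCCGCAGCCGCC"
    "GCCGCCACTGCCACTCCCGCTCTCTCAGCGCCGCCGTCGCCACCGCCACCGCCACTGCCA"
    "CTACCACCGTCTGAGTCTGCAGTCCCGAGATCCCAGCCATCATGTCCATAGAGAAGATCT"
)

if __name__ == "__main__":
    print(find_atg_positions(SEQ4_FIRST_180))
```

Each position found this way is a candidate that the program then scores; the enumeration itself says nothing about which ATG is authentic.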

[0020] The linear discriminant analysis used in Gene Finder, a program for genomic exon prediction
(Solovyev V. V., Salamov A. A., Lawrence C. B. Predicting internal exons by oligonucleotide
composition and discriminant analysis of spliceable open reading frames. Nucleic Acids Res., 1994,
22: 5156-63), was applied to optimize prediction. In linear discriminant analysis, information on
several characteristics derived from the data is converted into numerical values and weighted, and a
score is then calculated. Here, the score is converted into a probability of similarity to an
initiation codon (the probability is the rate of correct answers obtained from data of sequences
whose initiation codon has been identified). Specifically, a probability of similarity to an
initiation codon is output for each ATG contained in a given cDNA sequence. Whether an ATG is
recognized as an initiation codon is determined by whether its probability is above a given
threshold value. The threshold value is set according to the plan for the subsequent analyses, that
is, according to the level of noise acceptable in them. For example, when 40% noise is acceptable, a
threshold value of 0.6 can be used. The weight parameters are determined so as to maximize the
performance of the prediction system, using as training data sequences whose initiation codon has
been identified. The above information A) and B) was each embodied in the following three items,
which were used as the characteristic information.

A) Information on similarity of given regions (several tens to several hundreds of base pairs) on
both sides of a putative ATG to translational regions



[0021]

1: the frequency of six-nucleotide words contained in the sequence from the ATG to a stop codon
(up to 300 bp downstream of the ATG)
2: the difference in six-nucleotide word frequency information between the 50 nucleotides upstream
and the 50 nucleotides downstream of the ATG
3: an index of similarity to a signal peptide [a hydrophobicity index of the most hydrophobic eight
amino acids among the 30 amino acids (90 nucleotides) downstream of the ATG]
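
Feature A1 can be illustrated with a short Python sketch. Several details are assumptions that the text does not settle: that the region ends just before the first in-frame stop codon, that six-nucleotide words are counted in overlapping windows, and that raw relative frequencies are used. The function names are hypothetical.

```python
# Illustrative sketch of feature A1 (six-nucleotide word frequency downstream
# of an ATG). Overlapping windows and exclusion of the stop codon are
# assumptions, not details given in the text.
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_segment(sequence, atg_position, max_length=300):
    """Return the sequence from the ATG (1-based position) up to, but not
    including, the first in-frame stop codon, limited to max_length bases."""
    seq = sequence.upper()
    start = atg_position - 1
    end = min(len(seq), start + max_length)
    for i in range(start, end - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return seq[start:i]
    return seq[start:end]


def hexamer_frequencies(segment):
    """Return the relative frequency of each overlapping six-nucleotide word."""
    counts = Counter(segment[i:i + 6] for i in range(len(segment) - 5))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


if __name__ == "__main__":
    segment = orf_segment("CCATGAAACCCTAGGG", 3)
    print(segment, hexamer_frequencies(segment))
```

In a full predictor these frequencies would be compared with frequencies learned from known coding regions; that comparison is not sketched here.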

B) Information on similarity of regions near a putative ATG to those near an authentic initiation
codon

[0022]

1: information from a weight matrix that uses three nucleotides as a unit, over the region from 14
nucleotides upstream of the ATG to 5 nucleotides downstream of the ATG
2: the presence or absence of other ATGs upstream of the ATG in the same frame (presence is 1 and
absence is 0)
3: the frequency of cytosine in the region from 36 bases upstream of the ATG to 7 bases downstream
of the ATG.
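
Features B2 and B3 are simple enough to sketch directly. The following Python functions are illustrations only: the names are hypothetical, B2 is assumed to ignore any stop codons lying between the two ATGs, and the B3 window is assumed to run from 36 bases before the A of the ATG to 7 bases after its G, with the ATG itself included.

```python
# Illustrative sketch of features B2 and B3. The window boundaries for B3 and
# the handling of intervening stop codons for B2 are assumptions.

def has_upstream_inframe_atg(sequence, atg_position):
    """Return 1 if another ATG lies upstream in the same reading frame, else 0."""
    seq = sequence.upper()
    start = atg_position - 1
    # Step back one codon at a time from the candidate ATG.
    for i in range(start - 3, -1, -3):
        if seq[i:i + 3] == "ATG":
            return 1
    return 0


def cytosine_frequency(sequence, atg_position, upstream=36, downstream=7):
    """Return the fraction of C in the window around the ATG, clipped to the
    ends of the sequence."""
    seq = sequence.upper()
    start = atg_position - 1
    left = max(0, start - upstream)
    right = min(len(seq), start + 3 + downstream)
    window = seq[left:right]
    return window.count("C") / len(window)


if __name__ == "__main__":
    example = "ATGAAAATGCCC"
    print(has_upstream_inframe_atg(example, 7))  # the ATG at position 1 is in frame
    print(cytosine_frequency(example, 7))
```

Clipping the window at the sequence ends is a design choice for fragments whose ATG lies close to the 5' end, such as an ATG at position 1.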

Example 2: Preparation of cDNA by the oligo capping method and analysis thereof by the program for
initiation codon prediction

[0023] A cDNA library was prepared by the oligo capping method and the plasmid DNA was extracted
from each clone by the standard method. Specifically, mRNA was extracted from human placenta and
human cultured cells (teratocarcinoma NT-2 and neuroblastoma SK-N-MC) by the method described in
the reference (J. Sambrook, E. F. Fritsch & T. Maniatis, Molecular Cloning, Second Edition, Cold
Spring Harbor Laboratory Press, 1989). Using an oligo cap linker (SEQ ID NO: 1) together with an
oligo dT adaptor primer (SEQ ID NO: 2) in the case of Tables 1 and 2, or a random adaptor primer
(SEQ ID NO: 3) in the case of Tables 3 and 4, BAP treatment, TAP treatment, RNA ligation, synthesis
of a first strand cDNA, and removal of RNA were performed according to the methods described in the
references (Suzuki & Sugano, Proteins, Nucleic Acids, and Enzymes, 41, 603-607, 1996, p606; Y.
Suzuki et al., Gene, 200, 149-156, 1997). The first strand cDNA was then converted into
double-stranded DNA by PCR, digested with SfiI, and cloned in the determined direction into vectors,
such as pMEISSCG, pMFL, etc., digested with DraIII (Sugano & Maruyama, Proteins, Nucleic Acids, and
Enzymes, 38, 472-481, 1993, p480). The obtained DNA was subjected to the sequencing reaction using a
DNA sequencing reagent (Dye Terminator Cycle Sequencing FS Ready Reaction Kit, PE Applied
Biosystems) following the manual and sequenced with a DNA sequencer (ABI PRISM 377, PE Applied
Biosystems). The DNA sequence of the 5'-region of each clone was analyzed once.

[0024] The presence or absence of an initiation codon in the DNA sequence of each clone was analyzed
using the developed program for cDNA initiation codon prediction (ATGpr). In this program, the
higher the score, the higher the probability that the ATG is an initiation codon. The maximum score
is 0.94.
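
The scoring scheme described in Example 1 (a weighted sum of numerical features, converted into a probability and compared with a threshold) can be sketched as follows. This is a hedged illustration, not ATGpr itself: the function names, the binning used to turn a score into a rate of correct answers, and the toy training data are all assumptions. Only the relation between 40% acceptable noise and a threshold of 0.6 comes from the text.

```python
# Illustrative sketch of the scoring scheme: weighted sum -> calibrated
# probability -> threshold. Names, binning, and data are assumptions.
import math


def discriminant_score(features, weights):
    """Return the weighted sum of the numerical features."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * x for w, x in zip(weights, features))


def score_to_probability(score, training, bin_width=1.0):
    """Return the fraction of true initiation codons among training ATGs whose
    score falls in the same bin as the given score (0.0 for an empty bin).

    training is a list of (score, is_true_initiation_codon) pairs."""
    target_bin = math.floor(score / bin_width)
    labels = [
        is_start
        for s, is_start in training
        if math.floor(s / bin_width) == target_bin
    ]
    if not labels:
        return 0.0
    return sum(labels) / len(labels)


def is_initiation_codon(probability, acceptable_noise):
    """Accept an ATG when its probability reaches 1 - acceptable_noise."""
    return probability >= 1.0 - acceptable_noise


if __name__ == "__main__":
    # Toy training set: (discriminant score, known to be a true start?).
    training = [(2.3, True), (2.7, True), (2.1, False), (2.9, True), (0.4, False)]
    score = discriminant_score([1.0, 2.0], [0.5, 1.0])
    probability = score_to_probability(score, training)
    print(score, probability, is_initiation_codon(probability, 0.4))
```

Fitting the weights themselves, which the text describes as maximizing performance on sequences with known initiation codons, is outside the scope of this sketch.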

(1) Analysis of translation initiation codons in the clones whose open reading frames are known in
databases among cDNA prepared by the oligo capping method

[0025] Among the results for all analyzed clones, the results for the clones that are known from
databases to contain the initiation codon in the determined sequences (F-NT2RP1000020,
F-NT2RP1000025, F-NT2RP1000039, and F-NT2RP1000046) are shown in Table 1. F-NT2RP1000020 (880 bp)
has 96% identity at nucleotide positions 88 to 690 to "human neuron-specific gamma-2 enolase"
(GenBank accession No. M22349); F-NT2RP1000025 (645 bp), 97% homology at positions 29 to 641 to
"human alpha-tubulin mRNA" (GenBank accession No. K00558); F-NT2RP1000039 (820 bp), 96% identity at
positions 12 to 820 to "human mRNA for elongation factor 1 alpha subunit (EF-1 alpha)" (GenBank
accession No. X03558); and F-NT2RP1000046 (788 bp), 97% identity at positions 3 to 788 to "human
M2-type pyruvate kinase mRNA" (GenBank accession No. M23725). The sequences of the 5'-region in
these clones are shown in SEQ ID NOs: 4, 5, 6, and 7.
Table 1

         F-NT2RP1000020          F-NT2RP1000025          F-NT2RP1000039          F-NT2RP1000046
ATG No.  Location   ATGpr        Location   ATGpr        Location   ATGpr        Location   ATGpr
         of ATG     Score        of ATG     Score        of ATG     Score        of ATG     Score
1        1          [illegible]  96         (0.94)       65         (0.90)       111        (0.94)
2        162        (0.84)       148        0.13         154        0.05         174        0.82
3        292        0.05         193        0.05         209        0.11         198        0.19
4        313        0.05         201        0.09         231        0.05         300        0.16
5        441        0.05         232        0.05         321        0.05         315        0.11

Note 1: ( ) indicates the translation initiation codon.
Note 2: "Location of ATG" is the nucleotide position of the ATG counted from the 5' end of the DNA
sequence. "ATG No." is the ordinal number of the ATG counted from the 5' end of the DNA sequence.



[0026] As shown in Table 1, among the cDNA prepared by the oligo capping method, the initiation
codons of the full-length clones whose open reading frames are known in databases were accurately
recognized by the initiation codon prediction program (ATGpr), coinciding with the initiation codons
in the databases.

(2) Analysis of initiation codons in the clones whose open reading frames are known in databases
among cDNA prepared by the oligo capping method

[0027] Among the results for the clones analyzed, the results for the clones whose initiation codon
is known from databases to be absent from the determined sequence (F-NT2RP1000013, F-NT2RP1000054,
and F-NT2RP1000122) are shown in Table 2. F-NT2RP1000013 (608 bp) has 97% identity at positions 1 to
606 to "human nuclear matrix protein (nmt55) mRNA" (GenBank accession No. U89867); F-NT2RP1000054
(869 bp), 96% identity at positions 1 to 869 to "human signal recognition particle (SRP54) mRNA"
(GenBank accession No. U51920); and F-NT2RP1000122 (813 bp), 98% identity at positions 1 to 813 to
"H. sapiens mRNA for 2-5A binding protein" (GenBank accession No. X76388). The sequences of the
5'-region of these clones are shown in SEQ ID NOs: 8, 9, and 10.



Table 2

         F-NT2RP1000013          F-NT2RP1000054          F-NT2RP1000122
ATG No.  Location   ATGpr        Location   ATGpr        Location   ATGpr
         of ATG     Score        of ATG     Score        of ATG     Score
1        21         0.05         31         0.12         23         0.07
2        27         0.05         60         0.20         100        0.05
3        32         0.32         87         0.05         166        0.05
4        56         0.11         97         0.05         235        0.06
5        119        0.10         146        0.05         316        0.05
6        125        0.08         172        0.05         346        0.05
7        141        0.05         180        0.11         406        0.05
8        155        0.06         218        0.07         431        0.05
9        161        0.06         272        0.05         469        0.06
10       176        0.08         319        0.07         546        0.12
11       203        0.07         346        0.05         553        0.05
12       290        0.20         363        0.07         574        0.05



[illegible text]

(3) Analysis of initiation codons in novel clones among cDNA prepared by the oligo capping method






Table 3

         F-ZRV6C1000408          F-ZRV6C1000454          [clone name illegible]
ATG No.  Location   ATGpr        Location   ATGpr        Location   ATGpr
         of ATG     Score        of ATG     Score        of ATG     Score
1        85         <0.94>       5          0.05         162        <0.86>
2        208        0.22         107        <0.87>       182        0.05
3        386        0.06         153        0.05         207        0.08
4        518        0.11         201        0.08         244        0.05
5        545        0.05         211        0.05         262        0.05
6                                236        0.07         303        0.11

Table 3 (cont'd)

         F-ZRV6C1000615          F-ZRV6C1000670
ATG No.  Location   ATGpr        Location   ATGpr
         of ATG     Score        of ATG     Score
1        85         <0.94>       120        <0.94>
2        208        0.26         187        0.54
3        386        0.05         312        0.06
4        518        0.09         388        0.05
5        545        0.05         445        0.05
[0031] In addition, among the results for the analyzed clones, the results for novel clones
predicted as not containing initiation codons (F-ZRV6C1001410, F-ZRV6C1001197, and F-ZRV6C1001472)
are shown in Table 4. The sequences of the 5'-region of these clones are shown in SEQ ID NOs: 16,
17, and 18.



Table 4

         F-ZRV6C1001410          F-ZRV6C1001197          F-ZRV6C1001472
ATG No.  Location   ATGpr        Location   ATGpr        Location   ATGpr
         of ATG     Score        of ATG     Score        of ATG     Score
1        23         0.05         5          0.24         77         0.25
2        31         0.07         141        0.25         126        [illegible]
3        71         0.06         202        0.05         149        [illegible]
4        178        0.05         219        0.05         194        0.05
5        214        0.05         228        0.05         213        0.22
6                                                        249        0.05
7                                                        338        0.09
8                                                        344        0.05
9                                                        351        0.05
10                                                       365        0.05

[0032] As shown in Table 4, F-ZRV6C1001410, F-ZRV6C1001197, and F-ZRV6C1001472 were recognized as
not containing initiation codons. These clones were thus judged as incomplete-length clones.

Industrial Applicability

[0033] The present invention provides a method for efficiently selecting full-length cDNAs. Clones
selected by the method of the present invention can express complete proteins. Therefore, the
present invention enables efficient analysis of the functions of isolated genes.

SEQUENCE LISTING
<110> Helix Research Institute, Inc.
<120> Method for screening full-length cDNA clones
<130> H1-806PCT

<150> JP 09-289982
<151> 1997-10-22

<160> 18 

<170> PatentIn version 2.0 

<210> 1 
<211> 30 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligo-capping linker sequence 
<400> 1 

AGCADCGAGU CGGCCDDGUO CGCCUACnCG 

<210> 2 
<211> 42
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligo(dT) adapter primer sequence 
<400> 2 

GCGGCTGAAG ACGGCCTATG TGGCCTTTTT TTTTTTTTTT TT 
<210> 3
<211> 32
<212> DNA
<213> Artificial Sequence
<220>
<223> Random adapter primer sequence
<400> 3 

GCGGCTGAAG ACGGCCTATG TGGCCNNNNN NC 
<210> 4
<211> 880
<212> DNA
<213> Homo sapiens
<400> 4

ATGCGCCCGC GCGGCCCTAT AGGCGCCTCC TCCGCCCGCC GCCCGGGAGC CGCAGCCGCC 60

GCCGCCACTG CCACTCCCGC TCTCTCAGCG CCGCCGTCGC CACCGCCACC GCCACTGCCA 120

CTACCACCGT CTGAGTCTGC AGTCCCGAGA TCCCAGCCAT CATGTCCATA GAGAAGATCT 180

GGGCCCGCGA 6ATCCTGGAC TCCCGCGGGA ACCCCACAGT GGAGGTGGAT CTCTATACTG 240

CCAAAGGTCC TTTCCGGGCT GCAGTGCCCA GTGGAGCCTC TACGGGCATC TATGAGGCCC 300

TGGAGCTGAG GGATGGAGAC AAACAGCGH ACTTAGGCAA AGGTGTCCTG AAGGCAGTGG 360

ACCACATCAA CTCCACCATC GCGCCAGCCC TCATCAGCTC AGGTCTCTCT GTGGTGGAGC 420

AAGA6AAACT GGACAACCTG ATGCTGGACT TGGATGG6AC TGAGAACAAA TCCAA6TTTG 480

GGGCCAATCC ATCCTGGCTG TGTCTCTGGC CCTGTGTAAG GCAHGGGCAA CTGAACNGGA 540

ACTGCCCCTG TATCGCCACA nOCTCAGCT TGGMCGGGAA CTCARACCTC ATCCTGCCTG 600

HGCCGGCCT TCAACGTGAT CAATGGHGG CTTCTCATGC CTGGCAACAA ANCTGGCCAT 660

TGCNGGAAn HCATGATCC TCCCCNTTGG GAAACTGAAA AACTTTCCGG AATGCCCHTC 720

CAACTAAGTT GCAAAAGGTC TACCKATACC CCCCAAGGGG AATTCCTCCA AGGGAACAAA 780

TNCCCGGGAA AGGAATGCCC CCCAAHUn NGGGGGAATA AAAGGTGGGC TTTGCCCCCC 840

CATTTTCCTG GAAAAAACNA TNAAAACCCT TGGGAAACTT 880

<210> 5
<211> 645
<212> DNA 

<213> Homo sapiens 
<400> 5 

TGTGCGnAC TTACCTCNAC TCnAGCTTG TCGGGGACGG TAACCGGGAC CCGGTGTCTG 60 

CTCCTGTCGC CTTCGCCTCC TAATCCCTAG CCACTATGCG TCAGTGCATC TCCATCCACG 120 

TTGGCCAGGC TGGTGTCCAH ATTGGCAATG CCTGCTGGGA GCTCTACTGC CTGGAACACG 180 

GCATCCAGCC CGATGGCCAG ATGCCAAGTG ACAAGACCAT TGGGGGAGGA 6ATGACTCCT 240 

TCAACACCTT CTTCAGTGAG ACGGGCGCTG GCAANCACGT GCCCCGGGCT GTGTTTGTAG 300 

ACTTGGAACC CACAGTCATT GATGAAGTTC GCACTGGCAC CTACCGCCAG CTCTTCCACC 360 

CTGAGCAGCT CATCNCAGGC AAGGAA6ATG CTGCCAATAA CTATGCCCGA GGGCACTACA 420 

CCATTGGCAA GGAGATCATT GACCTTGTGT TGGACCGAAT TCGCAAGCTG GCTCACCANT 480 

GCACCGGTCT TCANGGCTTC TTGGTTTTCC ACAGCTTTGG TGGGGGAACT GGnCTGGGT 540 

TCACCTCCCT GCTCATGGAA CGTCTCTCAG TTGATTATGG CAAGAAATCC AAGCTGGAGT 600 

TCTCCATTTA CCCAGCACCC CNGGTTTCCN CNGCTGTAlTr TN6AA 645 

<210> 6 

<211> 820 
<212> DNA
<213> Homo sapiens

<400> 6 

CTTTTTTCGC AACGGGnTG CCGCCAGAAC ACAGGTGTCG TGAAAACTAC CCCTAAAAGC 60 

CAAAATGGGA AAGGAAAAGA CTCATATCAA CATTGTCGTC ATTGGACACG TAGAHCGGG 120 

CAAGTCCACC ACTACTGGCC ATCTGATCTA TAAATGCGGT GGCATCGACA AAAGAACCAT 180 

TGAAAAAm GAGAAGGAGG CTGCTGAGAT GGGAAAGGGC TCCTTCAAGT ATGCCTGG6T 240 

CTTGGATAAA CTGAAAGCTG AGCGTGAACG TGGTATCACC AHGATATCT CCTTGTGGAA 300 

ATTTGAGACC AGCAAGTACT ATGTGACTAT CAnOATGCC CCAGGACACA GAGACTHAT 360 

CAAAAACATG AHACAGGGA CATCTCAGGC TGACTGTGCT GTCCTGATTG nGCTGCTGG 420 

TGTTGGTGAA mGAAGCTG GTATCTCCAA GAATGGGCAG ACCCGAGAGC ATGCCCTTCT 480 

GGCTTACACA CTGGGTGTGA AACAACTAAT tGTCGGTGTT AACAAAATGG ATTCACTGAN 540

CCACCCTACA GCCAGAAGAA ATATGANGAA ATTGTTAAG6 AAGTCAGCAC mCATTAAG 600 

AAAATTGGCT ACAACCCCGA CACAGTANCA TTTGTGCCAA mCTGGTTG GAATGGTGAC 660 
AACATGCTGG AACCAAHGC TAACATGCCT TGGTTCCAGG GATGGAAAAT CCCCCNTTAA 720
GGATGGCNAT GCCATTGGAA CCCCCCTGCT TGAAGGCTCT GGANTGCATC CTAHCACCAA 780
CTCCnCAAA nCAAAAACC CCnGCNCCC GCCTCCNCCA 820

<210> 7 

<211> 788
<212> DNA 
<213> Homo sapiens 

<400> 7 

GAGGCTGAGG CAGTGGCTCC HGCACAGCA GCTGCACGCG CCGTGGCTCC GGATCTCTTC 
GTCTTTGCAG CGTAGCCCGA GTCGGTCAGC GCCGGAGGAC CTCAGCAGCC ATGrCGAAGC 
CCCATAGTGA AGCCGGGACT GCCTTCATTC AGACCCAGCA GCTGCACGCA GCCATGGCTG 
ACACAnCCT GGAGCACAT6 TGCCGCCTGG ACATTGAnC ACCACCCATC ACACCCCGGA 
ACACTGGCAT CATCTGTACC ATTGGCCCAG CTTCCCGATC AGTGGAGACG TTGAAGGAGA 
TGATTAAGTC TGGAATGAAT GTGGCTCGTC TGAACTTCTC TCATGGAACT CATGAGTACC 
ATGCGGAGAC CATCAAGAAT CTGCGCACAG CCACGGAAAG CTTTGCTTCT GACCCCATCC 
TCTACCGGCC CGTTGCTGTG GCTCTAGACA CTAAAGGACC TGAGATCCGA ACTGGGCTCA 
TCAAGCGCAC CGGCACTGCA GAGGTGGAGC TGAAGAATGG AGCCACTCTC AAAATCACGC 
TGGATAATGC CTACATGGAA AACTCTGACG AGAACATCCT GTGGCTGGAC TACAAGAACA 
TCTGCAAGGT GGTGGAAGTG GGCAACAAGA TCTACGTGGA TGATGGGCTN ATTTCTCTCC 
AGGTGAACAC AAAGGTGCCG ACTTCCTGGG TGACNGANGT GGAAAATGGT GGCTCCTTGG 
GCNCAAGAAA GGTGTGAACT TCCTGGGGCT GCTGTGGAOT TGCCTGCTGT GTCNGAAAAA 
GACATCCA 

<210> 8 

<211> 608 
<212> DNA 

<213> Homo sapiens

<400> 8
ACAGCCTGGC TCCTTTGAGT ATGAATATGC CATGCGCTGG AAGGCACTCA TTGAGATGGA 
GAAGCAGCAG CAGGACCAAG TGGACCGCAA CATCNAGGAG GCTCGTGAGA AGCTGGAGAT 
GGAGATGGAA GCTGCACGCG ATGAGCACCA GGTCATGCTA ATGAGACAGG ATTTGATGAG 
GCGCCAAGAA GAACTTCGGA GGATGGAAGA GCTGCACAAC CAAGANGTGC AAAAACGAAA 240

GCAACTGGAG CTCAGGCAGG AGGAANAGCG CAGGCGCCGT GAAGAANAGA TGCGGCGGCA 300 

GCAAGAAGAA ATGATGCGGC GACNGCAGGA AGGAHCAAG GGAACCTTCC CTGATGCGAG 360 

AGAGCAGGAG AnCGGATGG GTCNGATGGC TATGGGAGGT GCTATGGGCA TAAACNACAG 420 

ATGTGCCATG CCCCCTGCTC CTGTGCCAGC TGGTACCCCA GCTCCTCCAG GACCTGCCAC 480 

TATTATGCCG GATGGAACTT TGCGAITGAC CCCACCNACA ACTGAACGCT TTGGTCNGGC 540 

TGCTACKATG GAAKGAATTG GGGCAATTGG TGGAACTCCT CCTGCATTCN ACCGTGCAGC 600 

TCCTGGGA 608

<210> 9 

<211> 869 
<212> DNA
<213> Homo sapiens 

<400> 9 

ATATTAAACT AGTGAAGCAA CTAAGAGAAA ATGTTAAGTC TGCTAnGAT CTTGAAGAGA 60 

TGGCATCTGG TCTTAACAAA AGAAAAATGA TTCAGCATGC TGTAmAAA GAAOTGTGA 120 

AGCTTGTACA CCCTGGACn AAGCCATGGA CACCCACTAA AGGAAAACAA AATGTGATTA 180 

TGTTTGnGG ATTGCAAGGG AGTGGTAAAA CAACAACATG TTCAAAGCTA GCATATTAn 240 

ACCAGAGGAA AfSGTTGGAAG ACCTGTTTAA TATGTGCAGA CACAnCAGA GCAGGGGCTT 300 

TTGACCAACT AAAACAGAAT GCTACCAAAG CAAGAATTCC ATTTTATGGA AGCTATACAG 360 

AAATGGATCC TGTCATCATT GCTTCT6AAG GAGTAGAGAA ATTTAAAAAT GAAAATTTTG 420 

AAATTATTAT TGTTGATACA AGTGGCCGCC ACAAACAA6A AGACTCTTTG TTTGAAGAAA 480 

TGCTTCAAGT TGCTAAIGCT ATACAACCTG ATAACATTGT TTATGTGATG GATGCCTCCA 540 

TTGGGCAGGC TTGTGAAGCC CAGGCTAAGG CTTTTAAAGA TAAAGTAGAT GTACCTCAGT 600 

AATAGTGACA AAACTTGATG GCCATGCAAA ANGAAGTGGT GCACTCAGTG CAGTCGCTGC 660 

CACAAAAAAT CCGATTATTT TCATTGGTAC AGGGGGAACA TATANATGAC TTTGAACCTT 720 

TCAAAAACAC AGCCTTTTAT TAACAAACTT CTTGGTATKG GCGACATTGA AAGGACTGAT 780 

AAATAAAGTC CACNAATTGA AATTTGGATG ACNATGNAAA CCCTTATTGA AAAAATTGAA 840 

ACATNCTCCA GTTTTACTn GCGAAACNT 869 

<210> 10

<211> 813 
<212> DNA 
<213> Homo sapiens



<400> 10 

GTTGTGGTAT CTGTATTMC AAATGCCCCT TTCGCGCCTT ATCAATTGTC AATCTACCAA 60 

GeAACTTGGA AAAAGAAACC ACACATCGAT ATOTGCCAA TGCCTTCAAA CTTCACAGGT 120 

TGCCTATCCC TCGTCCAGGT GAAGTTTTGG GAnAGTTGG AACTAATGGT ATTGGAAAGT 180 

CAACTGCTTT AAAAATTTTA GCAGGAAAAC AAAAGCCAAA CCTTGGAAAG TACGATQATC 240 

CTCCTGACTG GCAGGAGAH HGACnATT TCCGTGGATC TGAATTACAA AATTACTTTA 300 

CAAAGATTCT AGAAGATGAC CTAAAAGCCA TCATCAAACC TCAATATGTA GACCAGATO 360 

CTAAGGCTGC AAAGGGGACA GTGGGATCTA TTnGGACCG AAAAGATGAA ACAAAGACAC 420 

AGGCAATOT ATGTCAGCAG CTTGATTTAA CCCACCTAAA AGAACGAAAT GTTGAAGATC 480 

nrCAGGAGG AGAGHGCAG AGATTTGCTT GTGCTCTCGT TTGCATACAG AAAGCTGATA 540

TrrrCATGTT TGATGAGCCT TCTACrrACC TAGATGTCAA GCAGCCTTTA AAGGCTGCTA 600 

TTACTATACG ATCTCTAATA AATCCAGATA GATATATCAT TGTGGTGGAA CATGATCTAA 660 

GTGTATTAGA CTATCTCTCC GACTTCATCT GCTGTTTATA TGGTGTACCA AGCGCCTATG 720 

GAATTGTCAC TATGCCTTn AGTGTTAGAA AAGGCATAAA CNTTTTTFGG ATGGGTATGT 780 

TCCAACAGAA AACTTGAHAA TCNNAAATGC NTC 813 

<210> 11

<211> 655
<212> DNA
<213> Homo sapiens

<400> 11

AGACTCTCAC CGCAGC6GCC AGCAACGCCA GCCGTOACG CGTTCGGTCC TCCTTGGCTG 60 

ACTCACCGCC CTCGCCGCCG CACCAIGGAC GCCCCCAGGC AGGTGGTCAA CTTrGGGCCT 120 

GGTCCCGCCA AGCTGCCGCA CTCAGTGnG mGAGAIAC AAAAGGAATT ATTA6ACTAC 180 

AAAGGAGTTG GCATTAGTGT TCTTGAAATG AGTCACAGCT CATCAGATTT T6CCAAGATT 240 

ATTAACAATA CAGAGAATCT TGTGCGGGAA TTGCTAGCTG TTCCAGACAA CTATAAGGTG 300 

ATTTTTCTGC AAGGAGGTGG GTGCGGCCAG TTCAGTGCTG tCCCCHAAA CCTCATTGGC 360 

TTGAAAGCAG GAAGGTGTGC GGACTATGTG GTGACAGGAG CTTGGTCAGC TAAGGCCGCA 420 

GAAGAAGCCA AGAAGTHGC GACTATAAAT ATCGTTCACC CTAAACTTGG GAGTTATACA 480 

AAAAnCCAG ATCCAAGCAC CTGGAACCTC AACCCAHATG CCTCCTACGT GTTTTATTGC 540 

NCAAATGAAA CGGTGCATGG TGTTGANTTT GACTTTATAC CCMATGTCAA GGGAACANTA 600 

CTGCTTTGTG ACATTTTCCT CCAACTTCCT GTCCAANCCA ATTGHATGn TCCAA 655 
<210> 12

<211> 599
<212> DNA
<213> Homo sapiens

<400> 12 

AAACATGCGC AGGCGCCGTG TGGCACTCGG CGGTCGAAAG GGGAGTTCAA GGAGACGG6G 60 

GCGACGCGGC TGAGGGCTTC TCGTCGG6GT CGGGGCTGCA GCCGTCATGC CGGGGATAGT 120 

GGAGCTCCCC ACTCTAGAGG AGCTGAAAGT AGATGAGGTG AAAATTAGTT CTGCTGTGCT 180 

TAAACCTGCG GCCCATCACT ATG6AGCTCA ATGT6ATAAG CCCAACAAG6 AATTTATGCT 240 

CTGCCGCTGG GAANAGAAAG ATCCGAGGCG GTGCTTAGAG GAAGGCAAAC TGGTCAACAA 300 

GTGTGCTTTG GACTTCTTTA GGCAGATAAA ACGTCACT6T GCAGAGCCTT TTACAGAATA 360 

TTGGACTTGC ATTGATTATA CTGGCCAGCA GTTATTTCGT CACTGTCGCA AACAGCAGGC 420 

AAAGTTTGAC HAGTGTGTGC TGGACAAACT GGGCTGGGTG CGGCCTGACC TGGGAAAACT 480 

GTCAAAGGTC ACCAAAGTGA AAACAGATCN ACCrTTACCG GANAATCCCT ATCACTCAAG 540 

AACAAGAACG GATCCCAGCC CTGANATCNA AGGAAATCTG CANCCTGCCA CACATGGCA 599 

<210> 13

<211> 597
<212> DNA
<213> Homo sapiens

<400> 13

ATATCCGGAG TAGACGGAGC CGCAGTAGAC GGATCCGCGG CTGCACCAAA CACTGCCCCT 60 

CGGAGCCTGG TAGTGGGCCA CAAGCCCCCA GTCCCAGAGG CGTGATTTTC TGGCATCCTT 120 

AAATCTTGTG TCAAGGAnC GHATAATAT AACCAGAAAC CATGACGGCG GCTGAGAACG 180 

TATGCTACAC GTTAATTAAC GTGCCAATGG ATTCAGAACC ACCATCTGAA ATTAGCnAA 240 

AAAATGATCT AGAAAAAGGA GATGTAAAGT CAAAGACTGA AGCTTTGAAG AAAGTAATCA 300 

TTATGATTCT GAATGGTGAA AAACTTCCTG GACTTCTGAT GACCATCATT CGTTTTGTGC 360 
TACCTCTTCA GGATCACACT ATCAAGAAAT TACTTCTGGT ATTTTGCGAG AnCTTCCTA 420

AAACAACTCC AGATGGGAGA CTTTTACATG AGATGATCCT TGTATCTGAT GCAIACAGAA 480 

AGGATCTTCA ACATCCTAAT GAATTTATTC NAAG6ATCTA CTCTTCGTTT TCTTTGCAAA 540 

nCAAANAAA CANAAnOCT AAAACCTTTA ATGCCANCTA TNCCTGCATT TTTGGGA 597 



<210> 14

<211> 634
<212> DNA 
<213> Homo sapiens 

<400> 14 

AGACTCTCAC CGCAGCGGCC AGGAACGCCA GCCGTTCACG CGHCGGTCC TCCTTGGCTG 60 

ACTCACCGCC CTCGCCGCCG CACCATGGAC GCCCCCAGGC AG6TGGTCAA CTTTGGGCCT 120 

GGTCCCGCCA AGCTGCCGCA CTCAGTGTTG TTA6AGATAC AAAAGGAATT ATTAGACTAC 180 

AAAGGAilTTG GCATTAGTGT TCTTGAAATG AGTCACAGGT CATCAGATTT TGCCAAGATT 240 

AnAACAATA CAGAGAATCT TGTGCGGGAA nCCTAGCTG TTCCAGACAA CTATAAGGTG 300 

ATTTTTCTGC AAGGAGGTGG GTGCGGCCAG nCAGTGCTG TCCCCTTAAA CCTCATTG6C 360 

nCAAAGCAG GAANGTGTGC GGACTATCTG GTCACAGGAG CTTGGTCAGC TAAGGCCCCA 420

NAANAAGCCA AGAANTTTGG GACTATAAAT ATCGTTCACC CTAAACTTGG GAGTTATACA 480 

AAAAncCAG ATCCAAGCAC CTGGAACCTC AACCCAGATG CCTCCTACGT GTATTATTGC 540 

GCNAATGAAA CNGTGCATC6 TGTGGAirrCT GACTTTATAC CCGATGTCNA GGGAACATAC 600 

TGGTTTGTGA CATGTCCTCA AACTTCCCGT CCNA 634 

<210> 15 

<211> 757 
<212> DNA 
<213> Homo sapiens 

<400> 15 

AGTCTGCGGT GGGCTANCGG ACGGTCCGGC TTCCGGCGGC CGTncrGTC TCnCCTGGC 60

TGTCTCGCTG AATCGCGGCC GCCTTCTCAT CGCTCCTGGA AGGTCCCGAG CGCGACACCA 120 

TGTCGGAACC CGGGGGCGGC CGCGGCGAAG ACNGCTCGGC CGGATTGGAA GTGTCGGCCG 180 

TGCANAATGT GGCGGACGTG TCGGTGCTGC ANAAGCACCT GCGCAAGCTG GTGCC6CTGC 240 

TGCTGGAGGA CGGCGGCGAA GCGCCGGCCG CGCTGGAGGC GGCGCTGGAG 6AGAAGAGCG 300 

CCCTGGAGCA GATGCGCAAG nCCTTTCGG ACCCGCACGT CCACACGGTG CTGGTGGAGC 360 

GCTCCACGCT CAAACTGGAC GTCGGTGATG AAGGAGAAGA AGAAAAAGAA TTCATTTCCT 420 

ATAACATCAA CNTAGACATT CACTATGGGG TTAAATCCAA TAGCTTGGCA HCATTAAAC 480 

GTACTCCCGT GATTGATGCA GATAAACCCG TGTCTTCTCA NCTCCGGGTC CTTACACTCA 540 
GTGAANACTC NCCCTACNAA AACTTTGCAT TCTTTCAm ACAATGCAGT GGCTCCTTTT 600

TTTAANTCCT ACATTAAAAA ATCTGGCAAG GCAAACAGGG ATGGTGATAA AATGGCTCCT 660 

TCCmCAAA AAAAAATTGC CGAACTCNAA ATNGGACTCC nCCCTTGCA NCAAAATTTT 720
TGAAATTCCG GAAAATCAMC CTGCCCAATT CCTCCCC 757

<210> 16

<211> 300
<212> DNA
<213> Homo sapiens

<400> 16

ATCATTTCCT TATTTATAIT TCATGTTGGA ATGCHAAAT CGATAACCTT TGTATmGA 60 

AGT6CGCGAC ATGGAAGGTG ATCTGCAAGA GCTGCATCAG TCAAACACCG GGGGATAAAT 120 

GTGGATnGG GnCCGGCGT CAAGGTGAAG ATAATACCTA AAGAGGAACA CTCTAAAATG 180 

CCAGAAGCAC GTGAANAGCA ACCACAAGTT TAAATGAAGA CAAGCTGAAA CAACGCAAGC 240 
TGGTTTTATA TTAGATATTT GACTTAAACT ATCTCAATAA AGTTmCAG CTTTCACCAC 300 

<210> 17 

<211> 313 
<212> DNA 
<213> Homo sapiens

<400> 17 

AAAGATGGCG GCGGGGGAGG TAGGCAGAGC AG6ACGCCGC TGCTGCCGCC GCCACCGCCG 60 

CCTCCGCTCC ACTCGCCTCC CGTCCTTCAA ACTCACACCT CCCGGGAGGA GCTGTCCTGG 120 

CGCCGGGTCC CGCGGGGAAA ATGGTGGAGC CAGGGCAA6A TTTACTOCTT GCTGCnTGA 180 

GTGAGAGTGG AATTAGTCCG AATGACTCTT TGATAnGAT GGTGGAGATG CAN6GCTTGC 240 

AACTCCAATG CCTACCCCGT CAGTTCAGCA NTCAGTGCCA CTTAffTGCAT TANAACTAKG 300 
TTTG6AGACC GAA 313

<210> 18 

<211> 667 
<212> DNA 
<213> Homo sapiens

<400> 18

ACTGCCGGGC
TCGGCGTGAG
TCGCTGCGGG
GCCTGGTCAG
ACCATAATGA
CTTCAGCAAA
ACAAAATGCA
GAAGAAHAC
AAGACTTTAT
TAAACAAAAG
GATATGGAAC
TAAGAAGACA
TATTCGAAAT
GGGAATTTTA
6GAAAAAGAA
ACCANAGAGG
AAAACACNAA
AAACAGGATA
CTTGATGTGG
ACCGTATCCT
TGATGAGCn
TCTCAAGAAT
CAGAGTCGGA
AGAAGATGGG
TTAAAAGAAA
AGGGCNATAA
ATACTTCCAC
ACACNAAAGG
CNTGGATGCC
GATCCATATN
CATATTTTAG
ACTGAAAAAA
TTTGCTGTTG
TGAAATA

60
120
180
240
300
360
420
480
540

rAAn;mA TTGmTTTAN CANTTGCCT 660
667



Claims



1. A method for isolating a full-length cDNA clone, the method comprising:

(a) determining a nucleotide sequence from the 5'-region of a cDNA clone contained in a cDNA library;

(b) examining whether the nucleotide sequence determined in (a) contains an initiation codon; and

(c) selecting clones recognized as containing the initiation codon in (b).
2. The method of claim 1, wherein a cDNA library is constructed by a method for constructing a full length-enriched cDNA library.

3. The method of claim 1, wherein a cDNA library is constructed by a method comprising a step of modifying Cap of mRNA.

4. A method for constructing a full-length cDNA library, the method comprising:

(a) determining a nucleotide sequence from the 5'-region of a cDNA clone contained in a cDNA library;

(b) examining whether the nucleotide sequence determined in (a) contains an initiation codon;

(c) selecting clones recognized as containing the initiation codon in (b); and

(d) combining the clones selected in (c).

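5. The method of claim 4, wherein a cDNA library is constructed by a method for constructing a full length-enriched cDNA library.

6. The method of claim 4, wherein a cDNA library is constructed by a method comprising a step of modifying Cap of mRNA.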
7. A cDNA library obtainable by the method of claim 4. 
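
The selection steps recited in the claims can be illustrated with a minimal sketch. It is not the initiation-codon prediction software referred to in this document, whose algorithm is not reproduced here. It assumes a much simpler, hypothetical rule: a clone is kept if its 5'-region read contains an ATG followed by at least `min_codons` in-frame codons with no stop codon inside the read. The function names, the `min_codons` threshold, and the example reads are all illustrative assumptions.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def atg_positions(read):
    """Return the 0-based offset of every ATG in the read (overlaps allowed)."""
    seq = read.upper()
    return [m.start() for m in re.finditer(r"(?=ATG)", seq)]


def open_codons_after(read, start):
    """Count in-frame codons from `start` up to the first stop codon or the
    end of the read. The ATG itself is counted as the first codon."""
    seq = read.upper()
    count = 0
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count


def first_candidate_start(read, min_codons=30):
    """Return the offset of the first ATG that opens a long enough frame,
    or None if the read has no such ATG (hypothetical rule, see lead-in)."""
    for pos in atg_positions(read):
        if open_codons_after(read, pos) >= min_codons:
            return pos
    return None


def select_clones(reads, min_codons=30):
    """Keep the IDs of clones whose 5' read has a candidate initiation codon.
    `reads` maps a clone ID to its 5'-region sequence."""
    return [clone_id for clone_id, read in reads.items()
            if first_candidate_start(read, min_codons) is not None]


if __name__ == "__main__":
    # Made-up reads, not sequences from the listing above.
    reads = {
        "clone_a": "GGCACC" + "ATG" + "GCT" * 5 + "TAA" + "GG",
        "clone_b": "CCCCCCCCCC",
    }
    print(select_clones(reads, min_codons=5))
```

Codons containing an ambiguous base such as N are treated as non-stop codons in this sketch, so a low-quality read can overstate the length of its open frame; a practical screen would need to handle such bases explicitly.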